
    Content-Aware DataGuides for Indexing Large Collections of XML Documents

    XML is well-suited for modelling structured data with textual content. However, most indexing approaches perform structure and content matching independently, combining the retrieved path and keyword occurrences in a third step. This paper shows that retrieval in XML documents can be accelerated significantly by processing text and structure simultaneously during all retrieval phases. To this end, the Content-Aware DataGuide (CADG) enhances the well-known DataGuide with (1) simultaneous keyword and path matching and (2) a precomputed content/structure join. Extensive experiments prove the CADG to be 50-90% faster than the DataGuide for various sorts of query and document, including difficult cases such as poorly structured queries and recursive document paths. A new query classification scheme identifies precise query characteristics with a predominant influence on the performance of the individual indices. The experiments show that the CADG is applicable to many real-world applications, in particular large collections of heterogeneously structured XML documents.

    Structural Summaries as a Core Technology for Efficient XML Retrieval

    The Extensible Markup Language (XML) is extremely popular as a generic markup language for text documents with an explicit hierarchical structure. The different types of XML data found in today’s document repositories, digital libraries, intranets and on the web range from flat text with little meaningful structure to be queried, over truly semistructured data with a rich and often irregular structure, to rather rigidly structured documents with little text that would also fit a relational database system (RDBS). Not surprisingly, various ways of storing and retrieving XML data have been investigated, including native XML systems, relational engines based on RDBSs, and hybrid combinations thereof. Over the years a number of native XML indexing techniques have emerged, the most important ones being structure indices and labelling schemes. Structure indices represent the document schema (i.e., the hierarchy of nested tags that occur in the documents) in a compact central data structure so that structural query constraints (e.g., path or tree patterns) can be efficiently matched without accessing the documents. Labelling schemes specify ways to assign unique identifiers, or labels, to the document nodes so that specific relations (e.g., parent/child) between individual nodes can be inferred from their labels alone in a decentralized manner, again without accessing the documents themselves. Since both structure indices and labelling schemes provide compact approximate views on the document structure, we collectively refer to them as structural summaries. This work presents new structural summaries that enable highly efficient and scalable XML retrieval in native, relational and hybrid systems. The key contribution of our approach is threefold. (1) We introduce BIRD, a very efficient and expressive labelling scheme for XML, and the CADG, a combined text and structure index, and combine them as two complementary building blocks of the same XML retrieval system. (2) We propose a purely relational variant of BIRD and the CADG, called RCADG, that is extremely fast and scales up to large document collections. (3) We present the RCADG Cache, a hybrid system that enhances the RCADG with incremental query evaluation based on cached results of earlier queries. The RCADG Cache exploits schema information in the RCADG to detect cached query results that can supply some or all matches to a new query with little or no computational and I/O effort. A main-memory cache index ensures that reusable query results are quickly retrieved even in a huge cache. Our work shows that structural summaries significantly improve the efficiency and scalability of XML retrieval systems in several ways. Former relational approaches have largely ignored structural summaries. The RCADG shows that these native indexing techniques are equally effective for XML retrieval in RDBSs. BIRD, unlike some other labelling schemes, achieves high retrieval performance with a fairly modest storage overhead. To the best of our knowledge, the RCADG Cache is the only approach to take advantage of structural summaries for effectively detecting query containment or overlap. Moreover, no other XML cache we know of exploits intermediate results that are produced as a by-product during the evaluation from scratch. These are valuable cache contents that increase the effectiveness of the cache at no extra computational cost. Extensive experiments quantify the practical benefit of all of the proposed techniques, which amounts to a performance gain of several orders of magnitude compared to various other approaches.

    Visual exploration and retrieval of XML document collections with the generic system X2

    This article reports on the XML retrieval system X2, which has been developed at the University of Munich over the last five years. In a typical session with X2, the user first browses a structural summary of the XML database in order to select interesting elements and keywords occurring in documents. Using this intermediate result, queries combining structure and textual references are composed semiautomatically. After query evaluation, the full set of answers is presented in a visual and structured way. X2 largely exploits the structure found in documents, queries and answers to enable new interactive visualization and exploration techniques that support mixed IR and database-oriented querying, thus bridging the gap between these three views on the data to be retrieved. Another salient characteristic of X2, which distinguishes it from other visual query systems for XML, is that it supports various levels of detail in the presentation of answers, as well as techniques for dynamically reordering and grouping retrieved elements once the complete answer set has been computed.

    Highly anisotropic fluorine-based plasma etching of ultralow expansion glass

    Deep etching of glass and glass ceramics is far more challenging than silicon etching. For thermally insensitive microelectromechanical and microoptical systems, zero-expansion materials such as Zerodur or ultralow expansion (ULE) glass are intriguing. In contrast to Zerodur that exhibits a complex glass network composition, ULE glass consists of only two components, namely, TiO2 and SiO2. This fact is highly beneficial for plasma etching. Herein, a deep fluorine-based etching process for ULE 7972 glass is shown for the first time that yields an etch rate of up to 425 nm min⁻¹ while still achieving vertical sidewall angles of 87°. The process offers a selectivity of almost 20 with respect to a nickel hard mask and is overall comparable with fused silica. The chemical surface composition is additionally investigated to elucidate the etching process and the impact of the tool configuration in comparison with previously published etching results achieved in Zerodur. Therefore, deep and narrow trenches can be etched in ULE glass with high anisotropy, which supports a prospective implementation of ULE glass microstructures, for instance, in metrology and miniaturized precision applications.

    A major QTL controls susceptibility to spinal curvature in the curveback guppy

    Background: Understanding the genetic basis of heritable spinal curvature would benefit medicine and aquaculture. Heritable spinal curvature among otherwise healthy children (i.e., idiopathic scoliosis and Scheuermann kyphosis) accounts for more than 80% of all spinal curvatures and imposes a substantial healthcare cost through bracing, hospitalizations, surgery, and chronic back pain. In aquaculture, the prevalence of heritable spinal curvature can reach as high as 80% of a stock, and thus imposes a substantial cost through production losses. The genetic basis of heritable spinal curvature is unknown and so the objective of this work is to identify quantitative trait loci (QTL) affecting heritable spinal curvature in the curveback guppy. Prior work with curveback has demonstrated phenotypic parallels to human idiopathic-type scoliosis, suggesting shared biological pathways for the deformity. Results: A major effect QTL that acts in a recessive manner and accounts for curve susceptibility was detected in an initial mapping cross on LG 14. In a second cross, we confirmed this susceptibility locus and fine mapped it to a 5 cM region that explains 82.6% of the total phenotypic variance. Conclusions: We identify a major QTL that controls susceptibility to curvature. This locus contains over 100 genes, including MTNR1B, a candidate gene for human idiopathic scoliosis. The identification of genes associated with heritable spinal curvature in the curveback guppy has the potential to elucidate the biological basis of spinal curvature among humans and economically important teleosts.

    Chemical composition and antimicrobial activity of Populus nigra shoot resin

    The chemical composition of Populus nigra shoot resin has been investigated by chromatographic and spectroscopic methods. The analyses resulted in identification of 19 known compounds. The resin exhibited low activity against selected microorganisms.

    A species-wide inventory of NLR genes and alleles in Arabidopsis thaliana

    Infectious disease is both a major force of selection in nature and a prime cause of yield loss in agriculture. In plants, disease resistance is often conferred by nucleotide-binding leucine-rich repeat (NLR) proteins, intracellular immune receptors that recognize pathogen proteins and their effects on the host. Consistent with extensive balancing and positive selection, NLRs are encoded by one of the most variable gene families in plants, but the true extent of intraspecific NLR diversity has been unclear. Here, we define a nearly complete species-wide pan-NLRome in Arabidopsis thaliana based on sequence enrichment and long-read sequencing. The pan-NLRome largely saturates with approximately 40 well-chosen wild strains, with half of the pan-NLRome being present in most accessions. We chart NLR architectural diversity, identify new architectures, and quantify selective forces that act on specific NLRs and NLR domains. Our study provides a blueprint for defining pan-NLRomes.

    A bacterial cysteine protease effector protein interferes with photosynthesis to suppress plant innate immune responses

    The bacterial pathogen Pseudomonas syringae pv tomato DC3000 suppresses plant innate immunity with effector proteins injected by a type III secretion system (T3SS). The cysteine protease effector HopN1, which reduces the ability of DC3000 to elicit programmed cell death in non-host tobacco, was found to also suppress the production of defence-associated reactive oxygen species (ROS) and callose when delivered by Pseudomonas fluorescens heterologously expressing a P. syringae T3SS. Purified His6-tagged HopN1 was used to identify tomato PsbQ, a member of the oxygen evolving complex of photosystem II (PSII), as an interacting protein. HopN1 localized to chloroplasts and both degraded PsbQ and inhibited PSII activity in chloroplast preparations, whereas a HopN1 D299A non-catalytic mutant lost these abilities. Gene silencing of NtPsbQ in tobacco compromised ROS production and programmed cell death.

    The Spin Structure of the Nucleon

    We present an overview of recent experimental and theoretical advances in our understanding of the spin structure of protons and neutrons. Comment: 84 pages, 29 figures